Cream of the Crop 1

home *** CD-ROM | disk | FTP | other *** search

/ Cream of the Crop 1 / Cream of the Crop 1.iso / PROGRAM / SED15.ARJ / SED.LST < prev next >

Wrap

File List | 1991-09-22 | 25KB | 661 lines

SED(1) USER DOCUMENTATION SED(1) NAME sed - the stream editor SYNOPSIS sed [-n] [-g] [-e script] [-f sfilename] [filename ] DESCRIPTION sed reads each filename line by line, edits each line according to a script of commands as specified by the -e and -f arguments and then copies the edited line to the standard output. OPTIONS The -e option supplies a single edit command from the next argument; if there are several of these they are executed in the order in which they appear. If there is just one -e option and no -f's, the -e flag may be omitted. An -f option causes commands to be taken from the file sfilename; if there are several of these they are executed in the order in which they appear; -e and -f commands may be mixed. The script or sfilename can be adjacent to the -e or -f or can be the next argument on the command line. The -g option causes sed to act as though every substitute command in the following script has a g suffix. The -n option suppresses the default output. SCRIPTS A script consists of one or more sed commands of the following form: [address[,address]] function [arguments] Normally sed cyclically copies a line of input into a current text buffer, then applies in sequence all commands whose addresses select that line and then copies the buffer to standard output and clears the buffer. The -n option suppresses normal output so that only commands which do output (e.g. p) cause any writing to occur. Also, some commands (n, N) do their own line reads, and some others (o, d, D) cause all commands following in the script to be skipped (the D command also suppresses the clearing of the current text buffer that would normally occur before the next cycle). There is also a second buffer (called the 'hold space' that can be copied or appended to or from or swapped with the current text buffer. ADDRESSES An address is: a decimal number (which matches that numbered line where line numbers start at 1 and run cumulatively across files), or a '$' (which matches the last line of input), or a '/regular expression/' (which matches any line satisfying the expression. The following rules govern the address matching: * A command line with no addresses selects every input line. * A command line with one address selects every input line that matches that address. * A command line with two addresses selects the inclusive range from the first input line that matches the first address up to and including the next input that matches the second. (If the second address is a number less than or equal to the line number first selected, only one line is selected.) Once the second address is matched sed starts h**2 Documentation 21 September 1991 1 SED(1) USER DOCUMENTATION SED(1) looking for the first one again; thus, any number of these ranges will be matched. * The second address may be in the form of '+number'. This means that the command will stay selected for number lines after the first address is satisfied. * \?regular expression? where ? is any character is identical to /regular expression/. * The negation operator '!' preceding a function makes that function apply to every line not selected by the address(es). FUNCTIONS In the following list of functions, the maximum number of addresses permitted for each function is indicated in parentheses. An argument denoted 'text' consists of one or more lines, with all but the last ending with '\' to hide the newline. A command with this type argument must be the last on any command line or -e argument. Otherwise multiple commands may appear on a line separated by ';' characters. A command may have a trailing comment indicated by a '#' character. Comment lines begin with a '#'. Backslashes in text are treated as described in 'escape sequences below; they may be used to protect initial whitespace against the stripping that is done on every line of the script. An argument denoted 'label', 'rfile' or 'wfile' (which specify labels or file names) is not processed for 'escape sequences'. Therefore a ';' or '#' terminates the label or file name. This simplifies entering DOS style paths. Each 'wfile' is created before processing begins. There can be at most 10 distinct 'wfile' arguments. (1) a text Append the 'text' on output before reading the next input line. (2) b [label] Branch to the ':' command with the given 'label'. If no 'label' is given, branch to the end of the script. (2) c text Change lines by deleting the current text buffer and at the end of the address range, place 'text' on the output. Start the next input cycle. (2) d Delete the current text buffer. Start the next input cycle. (2) D Delete the first line of the current text buffer (all characters up to the first newline). Start the next input cycle. (2) g Replace the contents of the current text buffer with the contents of the hold space. (2) G Append the contents of the hold space to the current text buffer. (2) h Copy the current text buffer into the hold space. h**2 Documentation 21 September 1991 2 SED(1) USER DOCUMENTATION SED(1) (2) H Append a copy of the current text buffer to the hold space. (1) i text Insert the 'text' on the standard output. (2) l [w[file]] List current text buffer on standard output or to a file if the -w option follows. Non ASCII printable characters are expanded as shown in the 'escape sequence' section below. (2) n Copy the current text buffer to standard output. Read the next line of input into it. The current line number changes. (2) N Append the next line of input to the current text buffer, inserting an embedded newline between the two. The current line number changes. (2) p Copy the current text buffer to the standard output. (2) P Copy the first line of the current text buffer (all characters up to the first newline) to standard output. (1) q Quit. Perform any pending outputs (a or r commands) and terminate sed. (1) r rfile Read the contents of 'rfile'. Place them on the output before reading the next input line. (2) s /regular expression/replacement/flags Substitute the 'replacement' for instances of the 'regular expression' in the current text buffer. Any character may be used instead of '/'. In the 'regular expression' and in the 'replacement' text \1 - \9 are used to indicate the nth subexpression indicated by a '$...$' expression in the 'regular expression'. In the replacement text an & may be used to indicate the entire matched expression. If the replacement text consists only of the a single '%' character, then a copy of the replacement text for the previous s command is used as the replacement text for this command. 'Flags' are any of the following options, with the following provisos: if present w must be the last one; only the last of either p or P is used; and only the last 'n} is used. g - Global. Substitute for all nonoverlapping instances of the 'RE' rather than just the first one. p - Print the current text buffer if a replacement was made. P - Print the first line of the current text buffer if a replacement was made. h**2 Documentation 21 September 1991 3 SED(1) USER DOCUMENTATION SED(1) w[wfile] - Append the current text buffer to the file argument as in a w command if a replacement is made. Standard output is used if no file argument is given. n - Where n can be 1 through 512. Perform only the nth replacement. If g is also set or the -g option is selected, this option means that the nth and all succeeding substitutions should be performed. (2) t [label] Branch to the ':' command with the given 'label' if any s commands made any substitutions since the most recent read of an input line or execution of a t or T. If no 'label' is given, branch to the end of the script. (2) T [label] Branch to the ':' command with the given 'label' if no s commands have succeeded since the last input line or t or T command. Branch to the end of the script if no 'label' is given. (2) w [wfile] Write the current text buffer to 'wfile'. If no 'wfile' is given standard output is used. (2) W [wfile] Write the first line of the current text buffer to 'wfile'. If no 'wfile' is given standard output is used. (2) x Exchange the contents of the current text buffer and hold space. (2) y /string1/string2/ Translate. Replace each occurrence of a character in string1 with the corresponding character in string2. The '/' may be any character not in 'string1' or 'string2'. The lengths of the two strings must be equal. (2) ! function All-but. Apply the function (or group, if function is '{') only to lines not selected by the address(es). (0) : label This command defines a label for b T and t commands. (1) = Write a line containing the current line number to the standard output. (2) { Execute the following commands through a matching '}' only when the current line matches the address or address range given. (0) } The { command marks the end of a grouping started by a '{'. (0) An empty command is ignored. h**2 Documentation 21 September 1991 4 SED(1) USER DOCUMENTATION SED(1) ESCAPE SEQUENCES The following escape sequences are used to represent unprintable characters in 'text', 'regular expressions' and 'replacement' text. It is ignored in 'labels' and 'file's. If the character following the '\ 'is not list below the '\' causes the character to be quoted during script input. The l command also uses this convention. \a - bell (ASCII 07) \b - backspace (ASCII 08) \e - escape (ASCII 27) \f - formfeed (ASCII 12) \n - newline (ASCII 10) \r - return (ASCII 13) \t - tab (ASCII 09) \v - verticaltab(ASCII 11) \xhh - the ASCII character corresponding to 2 hex digits hh. \\ - the backslash itself. REGULAR EXPRESSIONS ('REs') Regular expressions can be built up from the following "single-character" 'RE's: c Any ordinary character not listed below. An ordinary character matches itself. \ Backslash. When followed by a special character the 'RE' matches the "quoted character" as listed in 'Escape Sequences' above. A backslash followed by one of <,>,(,),{,} or 0...9 represents an 'operator' in a regular expression, as described below. . Dot. Matches any single character except the NEWLINE at the end of a line. ^ Carat. As the leftmost character in an 'RE' this constrains the pattern to be an anchored match. That is it must match anchored at the first character in the line. In any other position the ^ is an ordinary character. $ The dollar sign as the rightmost character in an 'RE' matches the NEWLINE at the end of the line. At any other position the $ is an ordinary character. ^RE$ This requires the 'RE' to match the entire buffer. [c...] A nonempty string of characters enclosed by square brackets matches any single character in the string except the NEWLINE at the end of the string. If the first character of the string is a caret (^), then the 'RE' matches any character not in the string except the NEWLINE at the end of the string. A '-' sign may be used to express ranges of characters. For example the range '[0- 9]' is equivalent to the string '[0123456789]'. The '-' is treated as an ordinary character if it occurs in the string at a position that can not be part of a range. This construct is called 'set definition'. h**2 Documentation 21 September 1991 5 SED(1) USER DOCUMENTATION SED(1) \{m\} \{m,\} \{m,n\\} When any of these constructs follow an ordinary character, a dot, a 'set definition' or the '\n' construct. This construct matches the previous construct for a range of occurrences. At least m occurrences will be matched and at most n. "\{m,\}" matches at least m occurrences and "\{m}" matches exactly m. * When this follows an ordinary character, a dot, a 'set definition' or the '\n' construct, this 'RE' matches 0 or more occurrences of that construct. This pattern is called a 'closure'. + This pattern is similar to the star above but matches one or more occurrences of the previous construct. \< The sequence \< in an 'RE' requires that the scan position in the line must be immediately following a character that can not be part of a "word" and immediately preceding a character that can be part of a "word". In this context a "word" is any sequence of upper and lowercase letters, a numeral [0-9] or the underscore character (_). \> The sequence \> in an 'RE' requires that the scan position in the line must be immediately following a character that can be part of a "word" and immediately preceding a character that can not be part of a "word". $...$ An 'RE' enclosed between the character sequences $ and $ matches whatever the unadorned 'RE' matches, but saves the string matched by the enclosed RE in a numbered substring register. There can be up to nine such substrings in an 'RE', and the parenthesis operators can be nested. \n Match the contents on the nth substring register. When nested substrings are present, 'n' is determined by counting the occurrences of \( starting from the left. // The empty 'RE' (//) is equivalent to the last 'RE' encountered in the input processing. ERROR MESSAGES The following error messages may appear during the compilation phase of sed processing all cause sed to terminate: sed: bad expression 'hh' -- The escape sequence of "\x" did was not followed by two hex digits sed: bad value for match count on s command 'command' -- A maximum value of 512 is allowed for 'n' on an s command. sed: cannot create 'file' -- The listed output file could not be opened h**2 Documentation 21 September 1991 6 SED(1) USER DOCUMENTATION SED(1) sed: cannot open command-file 'file' -- The 'file' on an -f argument could not be opened sed: command "command" has trailing garbage -- Command was not terminated properly sed: duplicate label 'label' -- The indicated 'label' appeared on more than one ':' command sed: error processing: 'argument' -- The 'argument' is incorrect either a file name was missing or the g or n options had trailing garbage sed: garbled address 'command' -- Improper 'regular expression' in an address, line number in an address or + used in first address sed: garbled command 'command' -- Error in the construction of the 'regular expression' or 'replacement' in an s command, an ill-formed y command or a 'null' character was found sed: no addresses allowed for 'command -- The end of group (}) and label command (:), can not have addresses sed: no argument for -e -- The -e option did not have a 'script' sed: no such command as 'command' -- The function in the 'command' was illegal sed: only one address allowed for 'command' -- The a i q r and = commands allow only one address sed: range error in set 'command -- A [...'x'-'y'...] construct was found where y<x sed: RE too long: 'command' -- Internal buffer overflow while processing a 'character set' sed: too many commands, last was 'command' -- A maximum of 200 commands are allowed sed: too many labels: 'command' -- A maximum of 50 labels are allowed sed: too many line numbers 'command' -- More than 256 different line numbers were used or more than 50 + addresses were used sed: too many w files 'command' -- A maximum of 10 output files is allowed sed: too many {'s 'command' -- A '{' command did not have a matching '}' command sed: too many }'s 'command' -- A '}' command appeared before an opening '{' command sed: too much text: 'command' -- The internal command text buffer overflowed processing the command h**2 Documentation 21 September 1991 7 SED(1) USER DOCUMENTATION SED(1) sed: undefined label 'label' -- The listed 'label' was never defined on a : command sed: unknown flag 'option' -- The listed 'option' is not allowed on the invoking line for sed The following warning may be displayed during compilation: sed: Label not used 'label' -- The listed 'label' was defined but never referenced. During the actual editing the following fatal errors can occur: sed: append too long after line 'number' -- An A, G, H, or N command created a line in the buffer longer than 4000 characters sed: cannot open 'file' -- The r command could not open 'file' sed: infinite branch loop at line 'number' -- More than 50 branches were taken without the editing of the line completing sed: line too long at line 'number' -- While the s command was performing a substitution the line length exceeded 4000 characters sed: RE bad code 'code' -- An internal processing error has occurred while matching a 'regular expression' sed: too many appends after line 'number' -- An append command caused more than 20 reads and appends for the given line sed: too many reads after line 'number' -- A read command caused more than 20 reads and appends for the give line BUGS I tried to fix every problem I could find, but I believe the follow bugs still exist in this version: * The getline routine can overflow the buffer before checking for overflow * I still do not know exactly what the D command should do * Strange options on the s command are allowed * The handling of inrange for '{' commands * Error processing could be improved * All output files are overwritten even when there are errors h**2 Documentation 21 September 1991 8 SED(1) USER DOCUMENTATION SED(1) COMPATIBILITY This version of sed is a modification of the Internet supplied GNU version. That version was reverse-engineered from BSD 4.1 UNIX sed. The following changes, modifications and improvements have been made: * There is no hidden length limit (40 in BSD sed) on 'wfile' names. * There is no limit (8 in BSD sed) on the length of 'labels'. * The exchange command now works for long pattern and hold spaces. * 'Escape sequences' are inhibited for both 'label's and 'filename's. * All commands not having a 'text' argument can be separated by ";" or can have trailing comments (#) * a, c and i commands don't insist on a leading backslash '\n' in the text. * r and w commands do not insist on whitespace before the filename. * The g, P, p and 'n' options on s commands may be given in any order. * Escape sequences are valid in all contexts except file names and labels. * The full range of characters are allowed all 256 values. * In an 'RE', '+' calls for 1...n repeats of the previous pattern. * The l command produces a different format than the UNIX sed. * The W command (write first line of pattern space to file). * The T command (branch on last substitution failed). * sed's error messages have been made more specific and informative and cause processing to halt. * + allowed in the second address. * The empty RE "//" is allowed as a first address if a previous RE has been compiled * The -e and -f command line options do not require their arguments to be separate options. * If no arguments are given sed prints usage data. * In all contexts a blank file name means 'stdout'. This version otherwise appears to be equivalent to the UNIX version on the Sun4 computer. That is I believe anything that sed did on that system, this version of sed will do the same on either a Sun4 or a PC under DOS. If h**2 Documentation 21 September 1991 9 SED(1) USER DOCUMENTATION SED(1) anyone can really explain what sed is really supposed to do as explained in the UNIX documentation I would appreciate the information. The manual page refers to ed for further details which is ambiguous at best and the description I read for either the D command or the options for the s command were not understandable by me. I would appreciate any comments, suggestions and even bug reports. I some- times can be reached on INTERNET (the fiscal year is almost over), but you can always contact me by mail or phone Howard Helman (helman@elm.sdd.trw.com) Box 340 Manhattan Beach, CA 90266 213.372.5387 or after 11/1/91 310.372.538 h**2 Documentation 21 September 1991 10